Overview


Welcome to my Notebook! In this notebook I aimed to draw attractive visuals by using R and examined some important metrics, which are the key indicators of level of development. Also, I took a broader look at the general trend over years.

If you have a question or feedback, feel free to write and if you like this notebook, please do not forget to UPVOTE 🙂

Get Our Environment Set Up

Return Overview

Before we get started, I set up the environment. In this section, I loaded the libraries and read the data. Briefly, the dataset includes infant mortality rates (per 1,000 live births), adjusted net national income per capita (current US$) and total fertility rate (births per woman) between 1970 and 2016. Lastly, I adjusted the region names.

# Libraries
library(ggalt)                                     
library(dplyr)                                    
library(tidyr)
library(CGPfunctions)  
library(openxlsx) 
library(ggplot2)
library(ggrepel)
library(rworldmap)
library(RColorBrewer)
library(animation)
library(gridExtra)
library(magick)
library(plotly)
library(gganimate)
library(gapminder)

# Turning off warning messages
options(warn=-1)

# get the current directory
current_dir <- getwd()

# Reading xlsx document
MortalityFertilityIncome <- read.xlsx(paste0(current_dir,"/data/MortalityFertilityIncome.xlsx"))
attach(MortalityFertilityIncome)

# Changing the Region names
MortalityFertilityIncome$Region <- factor(MortalityFertilityIncome$Region,
                                          levels = c("Europe & Central Asia",
                                                     "North America",
                                                     "Latin America & Caribbean",
                                                     "East Asia & Pacific",
                                                     "South Asia",
                                                     "Middle East & North Africa",
                                                     "Sub-Saharan Africa"),
                                          labels = c("Europe &\nCentral Asia",
                                                     "North America",
                                                     "Latin America &\nCaribbean",
                                                     "East Asia &\nPacific",
                                                     "South Asia",
                                                     "Middle East &\nNorth Africa",
                                                     "Sub-Saharan\nAfrica"))

Mortality Rate, Infant (per 1,000 live births) & Adjusted Net National Income Per Capita (current US$) & Fertility Rate, Total (births per woman)

Return Overview

In this section, I examined the three important indicators by using different type of charts and selected 2016 as the year but if you want to examine another year, you can do that by doing small changes in the code.

Scatter Plot: Mortality Rate, Infant (per 1,000 live births) & Adjusted Net National Income Per Capita (current US$)


To compare infant mortality rate and income per capita among countries, I preferred a scatter plot and fitted a line by using a simple linear regression and calculated R-squared, which is defined as a statistical measure of how close the data are to the fitted regression line. It seems that the infant mortality rate and income per capita are negatively correlated.

# Calculate R2
mR2 <- summary(lm(m2016 ~ i2016 + log(i2016), data = MortalityFertilityIncome))$r.squared
mR2 <- paste0(format(mR2, digits = 2), "%")

# ggplot
p <- ggplot(MortalityFertilityIncome, 
            aes(x = i2016, y = m2016)) +
  # Draw and color the points
  geom_point(mapping = aes(color = Region),
             shape = 1,
             size = 4,
             stroke = 1.5) +
  # Draw a line fits to data
  geom_smooth(mapping = aes(linetype = "r2"),
              method = "lm",
              formula = y ~ x + log(x), se = FALSE,
              color = "red") +
  # Determine the countries which will have dark labels
  geom_text_repel(mapping = aes(label = Country.Name, alpha = labels),data = transform(MortalityFertilityIncome,labels = Country.Name %in% c("Turkey", "Russia", "Venezuela", "Iraq", "Mayanmar", "Sudan", "Afghanistan", "Congo", "Greece", "Argentinia", "Brazil", "India", "China", "South Africa", "Spain", "Cape Verde","Bhutan", "Rwanda", "France", "Botswana", "France", "US", "Germany", "Britain", "Barbados", "Japan", "Norway", "New Zealand", "Sigapore"))) +
  # Adjusting x axis
  scale_x_continuous(name = "Adjusted Net National Income Per Capita (current US$)",
                     limits = c(0, 70000),
                     breaks = seq(0,70000, by=10000)) +
  # Adjusting y axis 
  scale_y_continuous(name = "Mortality Rate, Infant (per 1,000 live births)",
                     limits = c(0, 90),
                     breaks = seq(0,90, by = 10)) +
  # Scale colors
  scale_color_manual(name = "",
                     values = c("#BDB76B",
                                "#FF8C00",
                                "#28AADC",
                                "#248E84",
                                "#F2583F",
                                "#96503F",
                                "#24576D"),
                     guide = guide_legend(nrow = 1)) +
  # Cancel the alpha legend
  scale_alpha_discrete(guide = FALSE) +
  # Adjust the R2 legend
  scale_linetype(name = " ",
                 breaks = "r2",
                 labels = list(bquote(R^2==.(mR2))),
                 guide = guide_legend(override.aes = list(linetype = 1, size = 2, color = "red"))) +
  # Plot title
  ggtitle("Year: 2016") +
  # Choose a theme
  theme_bw() +
  # Adjust the theme
  theme(panel.border = element_blank(),
        panel.grid = element_blank(),
        panel.grid.major.y = element_line(color = "gray"),
        axis.line.x = element_line(color = "gray"),
        axis.text = element_text(face = "italic", size = 14),
        axis.title = element_text(face = "bold",size = 16),
        legend.position = "top",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.text = element_text(size = 16),
        plot.title = element_text(size = 18, face = "bold"))

ggsave(p, file = paste0(current_dir,"/figures/Figure01.png"), width = 14.5, height = 8.5)
fig1 <- image_read(paste0(current_dir,"/figures/Figure01.png"))
fig1

Scatter Plot: Adjusted Net National Income Per Capita (current US$) & Fertility Rate, Total (births per woman)


As in the previous section, I used a scatter plot to compare infant mortality and total fertility rates among countries. As opposed to the previous plot, it can be seen that the variables are positively correlated.

# Calculate R2
mR2 <- summary(lm(m2016 ~ f2016 + log(f2016), data = MortalityFertilityIncome))$r.squared
mR2 <- paste0(format(mR2, digits = 2), "%")

# ggplot
p <- ggplot(MortalityFertilityIncome, 
            aes(x = f2016, y = m2016)) +
  # Draw and color the points
  geom_point(mapping = aes(color = Region),
             shape = 1,
             size = 4,
             stroke = 1.5) +
  # Draw a line fits to data
  geom_smooth(mapping = aes(linetype = "r2"),
              method = "lm",
              formula = y ~ x + log(x), se = FALSE,
              color = "red") +
  # Determine the countries which will have dark labels
  geom_text_repel(mapping = aes(label = Country.Name, alpha = labels),data = transform(MortalityFertilityIncome,labels = Country.Name %in% c("Turkey", "Russia", "Venezuela", "Iraq", "Mayanmar", "Sudan", "Afghanistan", "Congo", "Greece", "Argentinia", "Brazil", "India", "China", "South Africa", "Spain", "Cape Verde","Bhutan", "Rwanda", "France", "Botswana", "France", "US", "Germany", "Britain", "Barbados", "Japan", "Norway", "New Zealand", "Sigapore"))) +
  # Adjusting x axis
  scale_x_continuous(name = "Fertility Rate, Total (births per woman)",
                     limits = c(0, 8),
                     breaks = seq(0,8, by=1)) +
  # Adjusting y axis 
  scale_y_continuous(name = "Mortality Rate, Infant (per 1,000 live births)",
                     limits = c(0, 90),
                     breaks = seq(0,90, by = 10)) +
  # Scale colors
  scale_color_manual(name = "",
                     values = c("#BDB76B",
                                "#FF8C00",
                                "#28AADC",
                                "#248E84",
                                "#F2583F",
                                "#96503F",
                                "#24576D"),
                     guide = guide_legend(nrow = 1)) +
  # Cancel the alpha legend
  scale_alpha_discrete(guide = FALSE) +
  # Adjust the R2 legend
  scale_linetype(name = " ",
                 breaks = "r2",
                 labels = list(bquote(R^2==.(mR2))),
                 guide = guide_legend(override.aes = list(linetype = 1, size = 2, color = "red"))) +
  # Plot title
  ggtitle("Year: 2016") +
  # Choose a theme
  theme_bw() +
  # Adjust the theme
  theme(panel.border = element_blank(),
        panel.grid = element_blank(),
        panel.grid.major.y = element_line(color = "gray"),
        axis.line.x = element_line(color = "gray"),
        axis.text = element_text(face = "italic", size = 14),
        axis.title = element_text(face = "bold",size = 16),
        legend.position = "top",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.text = element_text(size = 16),
        plot.title = element_text(size = 18, face = "bold"))

ggsave(p, file = paste0(current_dir,"/figures/Figure02.png"), width = 14.5, height = 8.5)
fig2 <- image_read(paste0(current_dir,"/figures/Figure02.png"))
fig2

3D Plot: Mortality Rate, Infant (per 1,000 live births) & Adjusted Net National Income Per Capita (current US$) & Fertility Rate, Total (births per woman)


In order to compare the three metrics together I used 3D plot. Marker color determines the mortality rate. Red means high mortality rate and green means low mortality rate.

fig <- plot_ly(MortalityFertilityIncome, x = i2016, y = m2016, z = f2016,
               hovertemplate = paste('Adjusted Net National Income Per Capita: $%{x:.2f}, <br>',
                                     'Mortality Rate: %%{y:.2f}, <br>',
                                     'Fertility Rate: %%{z:.2f}')) %>% 
  add_markers(color = m2016, colors = c("#4d934d", "#FFA500", "#8b0000")) %>%
  colorbar(title = "Mortality <br>Rate")
fig %>% layout(scene = list(xaxis = list(title = "Adjusted Net National <br>Income Per Capita"),
                            yaxis = list(title = "Mortality Rate"),
                            zaxis = list(title = "Fertility Rate")))

Box Plots: Comparing Regions

Return Overview

I looked at the general levels of the regions in 2016 by using boxplots. The figures show that when the income per capita is high, regions have low infant mortality and fertility rates.

m <- list(
  l = 50,
  r = 120,
  b = 65,
  t = 65,
  pad = 4
)

b1 <- plot_ly(MortalityFertilityIncome, x=~Region, y=m2016, type = "box", color = Region) %>% layout(showlegend = FALSE, autosize = F, width = 930, height = 400)

b2 <- plot_ly(MortalityFertilityIncome, x=~Region, y=i2016, type = "box", color = Region) %>% layout(showlegend = FALSE,autosize = F, width = 930, height = 400)

fig3 <- subplot(b1, b2)
fig3 <- fig3%>% layout(annotations = list(
 list(x = 0.02 , y = 1.1, text = "<b>Mortality Rate, Infant (per 1,000 live births)</b>", showarrow = F, xref='paper', yref='paper'),
 list(x = 1 , y = 1.1, text = "<b>Adjusted Net National Income Per Capita (current US$)</b>", showarrow = F, xref='paper', yref='paper')),
 margin = m
)

fig3
m <- list(
  l = 50,
  r = 120,
  b = 65,
  t = 65,
  pad = 4
)

b3 <- plot_ly(MortalityFertilityIncome, x=~Region, y=f2016, type = "box", color = Region) %>% 
  layout(annotations = list(
 list(x = 0.5 , y = 1.1, text = "<b>Fertility Rate, Total (births per woman)</b>", showarrow = F, xref='paper', yref='paper')), showlegend = FALSE, autosize = F, width = 900, margin = m)

b3

Map Animations from 1970 to 2016 by 5-year Intervals

Return Overview

In this part, I depicted the variables on map animations by using rworldmap and animation libraries in order to compare countries easier.

# Converting data into an object that rworldmap understands
mapped_data <- joinCountryData2Map(MortalityFertilityIncome, joinCode = "ISO3", nameJoinColumn = "Country.Code")
## 214 codes from your data successfully matched countries in the map
## 3 codes from your data failed to match with a country code in the map
## 29 codes from the map weren't represented in your data
# Arrange Margins
par(mar=c(1,0,1,0),xaxs="i",yaxs="i")

# Columns
mortalityears <- c("m1970","m1975","m1980","m1985","m1990","m1995", "m2000","m2005","m2010","m2015","m2016")
incomeyears <- c("i1970","i1975","i1980","i1985","i1990","i1995", "i2000","i2005","i2010","i2015","i2016")
fertilityears <- c("f1970","f1975","f1980","f1985","f1990","f1995", "f2000", "f2005", "f2010","f2015","f2016")
years <- c("1970","1975","1980","1985","1990","1995", "2000", "2005", "2010","2015","2016")

#%%

# Identify colour palette
colourPalette <- brewer.pal(7,'Reds')
# Create the first gif
saveGIF({
  for(i in 1:length(mortalityears)){
    mapMortality <- mapCountryData(mapped_data, 
                                   nameColumnToPlot = mortalityears[i], 
                                   mapTitle = paste("Mortality Rate, Infant (per 1,000 live births) - ",years[i]), 
                                   colourPalette = colourPalette, 
                                   catMethod='fixedWidth', 
                                   addLegend = F, 
                                   oceanCol="lightblue", 
                                   missingCountryCol="white")
    # Adding Legend  
    do.call(addMapLegend, c(mapMortality, 
                            legendLabels="all", 
                            legendWidth=0.5))
  }
}, #interval = 1, 
movie.name = paste0(current_dir,"/figures/Figure05.gif"), ani.width = 1000, ani.height = 600 )
## [1] FALSE
fig5 <- image_read(paste0(current_dir,"/figures/Figure05.gif"))
fig5

# Identify colour palette
colourPalette <- brewer.pal(7,"Greens")
# Create the second gif
saveGIF({
  for(i in 1:length(incomeyears)){
    mapIncome<-mapCountryData(mapped_data, 
                              nameColumnToPlot = incomeyears[i], 
                              mapTitle = paste("Adjusted Net National Income Per Capita (current US$) - ",years[i]), 
                              colourPalette = colourPalette, 
                              catMethod='fixedWidth',
                              addLegend = F, 
                              oceanCol="lightblue", 
                              missingCountryCol="white")
    # Adding Legend  
    do.call(addMapLegend, c(mapIncome,
                            legendLabels="all",
                            legendWidth=0.5))
  }
}, interval = 1, movie.name = paste0(current_dir,"/figures/Figure06.gif"), ani.width = 1000, ani.height = 600)
## [1] FALSE
fig6 <- image_read(paste0(current_dir,"/figures/Figure06.gif"))
fig6

#Identify colour palette
colourPalette <- brewer.pal(7,"Blues")
# Create the third gif
saveGIF({
  for(i in 1:length(fertilityears)){
    mapIncome<-mapCountryData(mapped_data, 
                              nameColumnToPlot = fertilityears[i], 
                              mapTitle = paste("Fertility Rate, Total (births per woman) - ",years[i]), 
                              colourPalette = colourPalette, 
                              catMethod='fixedWidth',
                              addLegend = F, 
                              oceanCol="lightblue", 
                              missingCountryCol="white")
    # Adding Legend  
    do.call(addMapLegend, c(mapIncome,
                            legendLabels="all",
                            legendWidth=0.5))
  }
}, interval = 1, movie.name = paste0(current_dir,"/figures/Figure07.gif"), ani.width = 1000, ani.height = 600 )
## [1] FALSE
fig7 <- image_read(paste0(current_dir,"/figures/Figure07.gif"))
fig7

Examining gapminder Dataset

Return Overview

gapminder is an important dataset which will help us to reach useful insights to determine key indicators of the level of development. First, I import the data and examine it.

str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

By using a pair plot we can glance the relation among the features. This plot is often a good starting point.

pairs(gapminder)

GDP Per Capita vs Life Expentency from 1952 to 2007

Return Overview

In this part, I examined the GDP Per Capita, Life Expectancy and their yearly changes. Also, we can compare the countries and continents. This animation shows the annual changes and when we hover our mouse on the points, we can see the country names. It looks like GDP Per Capita and Life Expectancy increases by time for most of the countries. Also, generally European Countries have better life ecpectancy.

df <- gapminder 
fig8 <- df %>%
  plot_ly(
    x = ~gdpPercap, 
    y = ~lifeExp, 
    size = ~pop, 
    color = ~continent, 
    frame = ~year, 
    text = ~country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig8 <- fig8 %>% layout(
    xaxis = list(title = "GDP Per Capita", type = "log"),
    yaxis = list(title = "Life Expentency")
    )
fig8 <- fig8 %>% animation_opts(
    1000, easing = "elastic", redraw = FALSE
  )
fig8 <- fig8 %>% animation_button(
    x = 1, xanchor = "right", y = 0, yanchor = "bottom"
  )
fig8 <- fig8 %>% animation_slider(
    currentvalue = list(prefix = "YEAR ", font = list(color="red"))
  )

fig8

As an alternative the below chart might be used. However, since this is not a dynamic chart, the knowledge you may reach will be limited.

p <- ggplot(
  df, 
  aes(x = gdpPercap, y=lifeExp, size = pop, colour = country)
  ) +
  geom_point(show.legend = FALSE, alpha = 0.7) +
  scale_color_viridis_d() +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(x = "GDP per capita", y = "Life expectancy")

fig9 <- p + facet_wrap(~continent) +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

fig9

To compare life expectancy I preferred the below dumbbell plot. The yearly differences among the European countries can be observed from the below chart.

# subset data
plotdata_long <- filter(gapminder,
                        continent == "Europe" &
                        year %in% c(1952, 2007)) %>%
  select(country, year, gdpPercap)
# convert data to wide format
plotdata_wide <- spread(plotdata_long, year, gdpPercap)
names(plotdata_wide) <- c("country", "y1952", "y2007")
# create dumbbell plot
ggplot(plotdata_wide, 
       aes(y = reorder(country, y1952),
           x = y1952,
           xend = y2007)) +  
  geom_dumbbell(size_x = 2,
                size = 0.8,
                size_xend = 2,
                colour_x = "aquamarine3", 
                colour = "cornflowerblue", 
                colour_xend = "indianred") +
  theme_minimal() + 
  labs(title = "Change of Life Expectancy in Europe",
       subtitle = "1952 to 2007",
       x = "Life Expectancy (years)",
       y = "")

As an alternative I used the below plot and determined the change in the life expectancy for randomly selected countries. Both charts have several advantages.

df <- gapminder %>%
  filter(year %in% c(1992, 1997, 2002, 2007) &
           country %in% c("Turkey","France", 
                          "Germany", "Canada", 
                          "United States","Mexico", 
                          "Brazil", "Venezuela")) %>%
  mutate(year = factor(year),
         lifeExp = round(lifeExp,2)) 
# create slope graph
newggslopegraph(df, year, lifeExp, country) +
  labs(title="Life Expectancy by Country", 
       subtitle="Randomly Selected Countries", 
       caption="source: gapminder")

Continent Comparison

Return Overview

In this section I compare the continents. I start with the life expectancy and continue with the GDP per capita.

df <- gapminder %>% group_by(continent, year) %>% summarise_at(vars(lifeExp), list(lifeExp = mean)) %>%
  mutate(year = factor(year), lifeExp = round(lifeExp,2)) %>% filter(year %in% c(1992, 1997, 2002, 2007))
# create slope graph
newggslopegraph(df, year, lifeExp, continent) +
  labs(title="Average Life Expectancy by Continent", 
       subtitle="",
       caption="source: gapminder")

df <- gapminder %>% group_by(continent, year) %>% summarise_at(vars(gdpPercap), list(gdpPercap = mean)) %>%
  mutate(year = factor(year), gdpPercap = round(gdpPercap,2)) %>% filter(year %in% c(1992, 1997, 2002, 2007))
# create slope graph
newggslopegraph(df, year, gdpPercap, continent) +
  labs(title="Average GDP Per Capita by Continent", 
       subtitle="",
       caption="source: gapminder")

I also wanted to compare the average populations of the continents. By changing mean with sum, you can also observe the continent population.

gm <- gapminder %>% group_by(continent, year) %>% summarise_at(vars(pop), list(pop = mean))
#gm <- aggregate(x = gapminder$pop, by = c(list(gapminder$continent), list(gapminder$year)),FUN = mean) 
p <- ggplot(gm, aes(x = year,
                    y = pop/1e+6, 
                    fill = forcats::fct_rev(continent))) +
  geom_area(color = "black") +
  labs(title = "Continent Population by Year",
       subtitle = "1952 to 2007",
       caption = "source: gapminder",
       x = "Year",
       y = "Average Population in Millions",
       fill = "Continent") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

ggplotly(p)

Conclusion

Return Overview

To sum up, the economic inequalities around the globe and the results of this unevenly distributed wealth can be observed from the above charts.

Thank you for reading my notebook and If you liked this notebook, please do not forget to UPVOTE 🙂

If you would like to glance my other notebooks, please CLICK HERE.